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Abstract. Drawing large graphs appropriately is an important step for the visual 
analysis of data from real-world networks. Here we present a novel multilevel 
algorithm to compute a graph layout with respect to a recently proposed metric 
that combines layout stress and entropy. As opposed to previous work, we do 
not solve the linear systems of the maxent-stress metric with a typical numerical 
solver. Instead we use a simple local iterative scheme within a multilevel ap¬ 
proach. To accelerate local optimization, we approximate long-range forces and 
use shared-memory parallelism. Our experiments validate the high potential of 
our approach, which is particularly appealing for dynamic graphs. In comparison 
to the previously best maxent-stress optimizer, which is sequential, our parallel 
implementation is on average 30 times faster already for static graphs (and still 
faster if executed on one thread) while producing a comparable solution quality. 


1 Introduction 


Drawing large networks (or graphs, we use both terms interchangeably) with hundreds 
of thousands of nodes and edges has a variety of relevant applications. One of them 
can be interactive visualization, which helps humans working on graph data to gain 
insights about the properties of the data. If a very large high-end display is not available 
for such purpose, a hierarchical approach allows the user to select an appropriate zoom 
level m. Moreover, drawings of large graphs can also be used as a preprocessing step 
in high-performance applications IS). 

One very promising class of layout algorithms in this context is based on the stress 
of a graph. Such algorithms can for instance be used for drawing graphs with fixed dis¬ 
tances between vertex pairs, provided a priori in a distance matrix Q. More recently, 
Gansner et al. Q proposed a similar model that includes besides the stress an additional 
entropy term (hence its name maxent-stress). While still using shortest path distances, 
this model often results in more satisfactory layouts for large networks. The optimiza¬ 
tion problem can be cast as solving Laplacian linear systems successively. Since each 
right-hand side in this succession depends on the previous solution, many linear systems 


need to be solved until convergence - more details can be found in Section 2.3 


Motivation. We want to employ this maxent-stress model for drawing large net¬ 
works quickly. Yet, solving many large Laplacian linear systems can be quite costly. A 
conjugate gradient solver (used in ID) is easy to implement but has superlinear running 
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Fig. 1. Drawings of bcsstkSl. Left to right: PivotMDS (U, Maxent 01, MulMent (new). 

time. Solvers with provably nearly-linear running time exist but are not yet competitive 
with established methods in practice (see a for an experimental comparison). Multi¬ 
grid methods EQ for Laplacian systems may seem appealing in this context, but their 
setup phase building the multigrid hierarchy can be expensive for large graphs. 

Gansner et al. Ill also suggested (but did not use) a simpler iterative refinement 
procedure for solving their optimization problem. This procedure would be slow to 
converge if used unmodified. However, if designed and implemented appropriately, it 
has the potential for fast convergence even on large graphs. Moreover, as already ob¬ 
served in a, it has high potential for parallelism and should work well on dynamic 
graphs by profiting from previous solutions. 

Outline and Contribution. The main contribution of this paper is to make the alter¬ 
native iterative local optimizer suggested by Gansner et al. a (for details on this and 
other related work see Section]^ usable and fast in practice. To this end, we design 
and implement a multilevel algorithm tailored to large networks (see Section]^. The 
employed coarsening algorithm for building the multilevel hierarchy can control the 
trade-off between the number of hierarchy levels and convergence speed of the local 
optimizer. One property of the local optimizer we exploit is its high degree of paral¬ 
lelism. Further acceleration is obtained by approximating long-range forces. To this 
end, we use coarser representatives stored in the multilevel hierarchy. 

Our experimental results in Section|^first reveal that force approximation rarely af¬ 
fects the layout quality significantly - in terms of maxent-stress values as well as visual 
quality, also see Figure and Appendix. The parallel implementation of our multilevel 
algorithm MulMent with force approximation is, however, on average 30 times faster 
than the reference implementation iH - and even our sequential approximate algorithm 
is faster than the reference. A contribution besides higher speed is that, in contrast to lH, 
our approach does not require input coordinates to optimize the maxent-stress measure. 




2 Preliminaries 

2.1 Basic Concepts 

Consider an undirected, connected graph G = {V,E,c,uj,d) with node weights c : 
V —>■ K>o, edge weights oj : E ^ R>o. target edge lengths d : E ^ K>o> = |y|, 
and m = \E\. Often the function d models the required distance between two adjacent 
vertices. By default, our initial inputs will have unit edge length d = 1 as well as unit 
node weights and edge weights c = 1, a; = 1. However, we will encounter weighted 
problems in the course of our multilevel algorithm. Let N{v) := {u : {u, u} S E} 
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denote the set of neighbors of v. A clustering of a graph is a set of blocks (= clusters) 
of nodes {Vi,..., 14} that partition V, i.e., 14 U • • • U 14 = 1^ and 1} n V} =0 for 
i ^ j. A layout of a graph is represented as a coordinate vector x, where is the two- 
dimensional coordinate of vertex v. Since edges are drawn as straight-line segments 
between their incident nodes, x is sufficient to define the complete graph layout. 

2.2 Related Work 

Most general-purpose layout algorithms for arbitrary undirected graphs are based on 
physical analogies and can be grouped, according to Hu and Shi ||9l, into two main 
classes; algorithms in the spring-electrical model and algorithms in the stress model. 
Both classes of algorithms often yield aesthetically pleasing graph layouts that empha¬ 
size symmetries and avoid edge crossings at least in sparse graphs. Recent surveys of 
algorithms in these models are given by Hu and Shi ||9l and by Kobourov IfTOll . 

In the spring-electrical model, first presented by Eades in 1984 m, the analogy 
is to represent nodes as electrically charged particles that repel each other while edges 
are represented as springs exerting attraction forces to adjacent nodes. A graph layout 
is then seen as a physical system of forces and the goal is to find an optimal layout 
corresponding to a minimum energy state. Spring-electrical algorithms are also known 
as spring embedders, with the algorithm by Fruchterman and Reingold ini being one 
of the most widely used spring embedder algorithms. It simulates the physical system 
of attractive and repulsive forces and iteratively moves each node into the direction 
of the resulting force. Each iteration requires, however, a quadratic number of force 
computations due to the repulsive forces between all pairs of nodes, which limits the 
scalability of the original approach. A faster approximative force calculation method 
based on quadtrees, aggregating especially the long-range forces, has been proposed by 
Barnes and Hut lfT3l and yields running times of 0{n log n) under certain assumptions. 

The (full) stress model is closely related to multidimensional scaling Qa, and was 
introduced in graph drawing by Kamada and Kawai ca. It is based on defining ideal 
distances d^v not only between adjacent vertices but between all vertex pairs (u, v) G 
VxV and then minimizing the layout stress t>Juvi\\xu — Xv\ \ — duv)'^, where Wuv 

is a weight factor typically chosen as Wuv = 1/'^™- Often, the distance d^v between 
adjacent nodes is set to 1, while the distance of non-adjacent nodes is the shortest- 
path distance in the graph. Solving this model is typically done by iteratively solving a 
series of linear systems 0. The need to compute all-pairs shortest paths and to store a 
quadratic number of distances again defeats the scalability of this original approach for 
large graphs. One of the fastest algorithms for approximatively solving the stress model 
instead is PivotMDS which requires distance calculations from each vertex only to 
a small set of k <^n suitably chosen pivot vertices. 

The stress model prescribes target distances not only for edges but for all vertex 
pairs. While this is a reasonable approach, it still brings artificial information into the 
layout process. An interesting alternative has been proposed by Gansner et al. |4l. Their 
algorithm (called Maxent) uses the sparse stress model, which only contains the stress 
terms for the edges of the graph. In order to deal with the remaining degrees of freedom 
in the layout, they suggest using the maximum entropy principle instead. Since our 
algorithm is closely related to Maxent, we discuss the latter in more detail in Section[Z3| 
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A general approach for speeding up layout computations for large graphs is the 
multilevel technique, which has been used in the spring-electrical II16I17II and in the 
stress model d. A multilevel algorithm computes a sequence of increasingly coarse 
but structurally related graphs as abstractions of the original graph. Starting from a 
layout of the coarsest graph, incremental refinement steps using the previous layout as 
a scaffold eventually produce a layout of the entire input graph, where the refinement 
steps are fast due to the good initial layouts. 

In addition to sequential algorithms for drawing large graphs, there is previous 
research in parallel layout algorithms, particularly using a graphics processing unit 
(GPU). Frishman and Tal 1(191 presented a multilevel force-based layout algorithm and 
implemented it using GPU-based parallelization. Ingram et al. Il20l also exploit parallel 
GPU computations and presented a multilevel stress-based layout algorithm. 


2.3 Maxent-stress optimization 

Gansner et al. ||4| proposed the maxent-stress model that combines a sparse stress model 
with an entropy term to resolve the degrees of freedom for non-adjacent vertex pairs. 
The entropy term itself is optimized when all nodes are spread out uniformly, similar 
to the repulsive forces in the spring-electrical model. Gansner et al. showed that the 
maxent-stress model performs well on several measures of layout quality in distance- 
based embeddings and avoids typical shortcomings of other stress models, particularly 
for non-rigid graphs. Formally, the maxent-stress M(x) of a layout x is definecj^as 

Af (x) ^ ^ Wuy ( I \ Xu Xy I I duV ) ^ ^ ^ ||Xu (1) 


where duv is the target distance between nodes u and v and Wuv is a weight factor 
typically chosen as Throughout the paper, we use this as a weight factor. 

The scaling factor a is used to modulate the strength of the entropy term and is gradually 
reduced in the implementation. 

Gansner et al. minimize the maxent-stress using a technique that repeatedly solves 
Laplacian linear systems that additionally include a repulsive force vector which is 
approximated following the quadtree method of Barnes and Hut IfTSlI . 

Alternatively, they proposed (but did not implement) the following local iterative 
force-based scheme to solve the maxent-stress model; 
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where = X]{u y}eE Note that sometimes we use the abbreviation r{u, v) := 
\\x'^-x''p shortly call these values r-values. 

' In fact, Gansner et al. define a slightly more general model that considers the stress term for 
arbitrary supersets S ^ E and allows variations of the entropy term. Our algorithm also works 
for the general model; to simplify the description, we restrict ourselves to the default model. 
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3 Multilevel Maxent-stress Optimization 


As mentioned, a successful (meta)heuristic for graph drawing (and other optimization 
problems on large graphs) is the multilevel approach. We employ it for maxent-stress 
optimization also for other reasons: (i) Some graphs (such as road networks) feature a 
hierarchical structure, which can be exploited to some extent by a multilevel approach 
and (ii) the computed hierarchy may be useful later on for multiscale visualization. 

Before going into the details, we briefly sketch our algorithmic approach: The 
method for creating the graph hierarchy is based on fast graph clustering with con¬ 
trollable cluster sizes. Each cluster computed on one hierarchy level is contracted into 
a new supervertex for the next level. After computing an initial layout on the coars¬ 
est hierarchy level, we improve the drawing on each finer level by iterating Eq. 0. 
Additionally, this process exploits the hierarchy and draws vertices that are densely 
connected with each other (i. e. which are in the same cluster) close to each other. 

3.1 Coarsening and Initial Layout 

To compute the clustering that is contracted in the manner described above, we adapt 
size-constrained label propagation (SCLaP) 11211 . an algorithm originally developed for 
coarsening and local improvement during multilevel graph partitioning. SCLaP itself is 
based on the graph clustering algorithm label propagation ll22l . The latter starts with a 
singleton clustering (i. e. each node is a cluster). The algorithm then works in rounds. 
Roughly speaking, in each round the algorithm visits all nodes in random order and 
assigns each node to the predominant cluster in its neighborhood (illustrated in Pigure|^ 
in the appendix). This way, cluster IDs (= labels) propagate through the graph and nodes 
in a dense cluster usually agree on a common label. 

However, clusters with unconstrained sizes are not desirable here since they would 
hamper convergence of the local improvement phase. The trade-off between this con¬ 
vergence speed and the number of hierarchy levels needs to be chosen properly for a 
fast overall running time. That is why SCLaP constrains cluster sizes, i. e. it introduces 
an upper bound U := max(max„ c(u), W) on the cluster sizes (W is specified below). 
Consequently, in each SCLaP round, nodes are assigned to the predominant cluster that 
is not overloaded after the label change. 

In our implementation, based on preliminary experiments, we set the parameter W 
to min(6^, ^), where b and / are tuning parameters and h is the level in the hierarchy 
that we are currently working on. The intuition behind this choice is that we want the 
contraction process not to be too strong on the fine levels in order two allow fast con¬ 
vergence of local improvement algorithms, whereas we allow stronger contractions on 
coarser levels. If the contracted graph is not more than 10% smaller than the graph on 
the current level, we decrease the value of / and set it to 0.7/. 

While the original label propagation algorithm repeats the process until conver¬ 
gence, SCLaP performs at most £ rounds, where £ is a tuning parameter. One round of 
the algorithm can be implemented to run mO{n + m) time. 

Contracting a clustering works as follows: each block of the clustering is contracted 
into a single node. The weight of the node is set to the sum of the weight of all nodes 
in the original block. There is an edge between two nodes u' and v' in the contracted 
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graph if the two corresponding blocks in the clustering are adjacent to each other in G, 
i. e. block u' and block v' are connected by at least one edge. The weight of an edge 
{u', v') is set to the sum of the weight of edges that run between block u' and block v' 
of the clustering. An example contraction is shown in Figure|^in the appendix. 

Initial Layout. The process of computing a size-constrained clustering and contracting 
it is repeated recursively. Then an initial layout is drawn, meaning that each of the two 
nodes of the coarsest graph is assigned to a position. We place the vertices such that the 
distance is optimal. The optimal distance of the two vertices is defined and motivated 
in the next section. 

3.2 Uncoarsening and Local Improvement 

When the initial layout has been computed, the solution is successively prolongated to 
the next finer level, where a local maxent-stress minimizer is used to improve the layout. 
For undoing the contraction, nodes that have been in a cluster are drawn at a random 
position around the location of its coarse representative. More precisely, let u be a (fine) 
vertex that is represented by the coarse supervertex u' at P = (x, y). We place u at a 
random position in a circle around P with radius r := yjc{v'). We do this by picking an 
angle uniformly at random in [0, 27r] and a distance to P uniformly at random in [0, r]. 
These two values are then used as a polar coordinate for v with respect to the origin P. 

Local Improvement. Our local improvement tries to minimize the maxent-stress on each 
level of the hierarchy based on Eq. (|^. Note, however, that simply iterating Eq. (|^ on 
each level is not sensible since coarse vertices represent a multitude of vertices. These 
vertices need space to be drawn on the next finer level. Now let u and v be two vertices 
on the same fixed level. We adjust distances duv on the current level in the hierarchy 
under consideration to c{u) -f c{v) with the intuition that vertices represented by 
u should be drawn in a circle around u with radius yjc{u) (similarly for v). 

As Gansneret al. H , we adjust the value of a in Eq. Q during the process. Since we 
want to approximate the maxent-stress, the value should be small. However, it cannot 
be too small initially since one would only solve a sparse stress model in this case. 
Hence, following Gansner et al. a, we set a to one initially and gradually reduce it by 
a := 0.3 • a until amin = 0.008 is reached. 

We call a single update step of the coordinates of all vertices using Eq. Q an itera¬ 
tion. Multiple iterations with the same value of a are called round. The current iteration 
uses the coordinates that have been computed in the previous iteration. We perform at 
most a iterations with the same value of a in one round. Then we reduce a as described 
above. If the relative change — a;^||/||a;^|| in the layout is smaller than some 

threshold e, we directly reduce the value of a and continue with the next round. 

Faster Local Improvement. The local optimization algorithm presented above has a 
theoretical running time of 0{n^) per iteration. To speed this up, one can use approxi¬ 
mations for the distances in the entropy term in Eq. Q. We do this by taking the cluster 
structure computed during coarsening into account: Let Vi U... U14 be the correspond¬ 
ing clustering and M : V —> U' = {1,... , fc} be the mapping that maps a node v G V 
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to its coarse representative. The first term in Eq. is computed as before and the sec¬ 
ond term is approximated by using the coordinates of the corresponding coarse vertex. 
As formula the second term written without the multiplicative factor becomes 



(3) 


M(u) = M(v) 




where x' maps a coarse vertex to its coordinates and i'{v') is the number of nodes that 
the coarse vertex represents on the current finer level. Roughly speaking, we reduced 
the necessary amount of computation to add up the values of r by summing up the 
correct values of r for all vertices that are in a sense close and using approximations for 
vertices that are far away. In our context, a vertex is close if it is in the same cluster as the 
currently processed vertex. If a vertex is not close, we use the coordinate of its coarse 
representative instead. We avoid unnecessary computation by scaling the approximated 
value of r with the number ^{v') of vertices it represents and adding approximated 
value of r only once. The last term in Eq. (j^ subtracts values of r for {u, v} G E that 
have been added in good faith in the first two summations. 

Note that if M is the identity, then the term in Eq. ([^ is the same as in the original 
Eq. 1^ In this case the first two summations add up the r-values for all pairs of vertices 
and the last sum subtracts the r-values for pairs that are in E. 

After the update of the vertices on the current level, we update the coordinates of the 
vertices on the coarser level used for approximation. We set the coordinate of a vertex 
v' on the coarser level to the weighted midpoint of the vertices represented by v'. 

Note that one obtains even faster algorithms by using a coarser version of the graph 
that is multiple levels beneath the current level in the graph hierarchy. That means in¬ 
stead of using the next coarser graph, we use the contracted graph which is h > 1 
levels beneath the current graph in the hierarchy - if there is such. Otherwise, we use 
the coarsest graph in the hierarchy. Obviously this yields a trade-off between solution 
quality and running time. Also note that this introduces an additional error. To see this, 
let the coarser vertices that have the same coarse representative on the level used for 
approximating values of r be called Ad-vertices (merged vertices). Now, for a vertex on 
the current level, the r-values of Ad-vertices are not accounted for in Eq. 0. Hence, 
we look at the parameter h carefully in Section and evaluate its impact on running 
time and solution quality. We call our algorithms MulMent and denote by MulMent^i 
the algorithm that uses an /i-level approximation of the r-values. With h if h = 0 we 
denote the quadratic-time algorithm. A rough analysis in Appendix [Bjyields : 

Proposition 1. Under the assumption of equal cluster sizes, the running time of one 

h + 2 

iteration of algorithm MulMent h, h > 0, is 0[m -\- n'*+i), respectively. 

Properly implemented, multilevel algorithms lead to fast convergence of their local 
optimizers. Moreover, the overall work performed by the multilevel approach is only a 
constant factor times the one on the finest level. This leads us to the initial appraisal that 
the same asymptotic running times may hold for the respective complete algorithms. 
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Shared-memory Parallelization. Our shared-memory parallelization of an iteration of 
the local optimizer uses OpenMP and works as follows: Since new coordinates of the 
vertices in the same iteration can be computed independently, we use multiple threads 
to do so. The relative change in the layout — x^||/||a:^|| can be computed in 

parallel using a reduce operation. Parallelism is also used analogously when working 
on different levels for the distance approximations in the entropy term. Other parts of the 
overall algorithm could potentially be parallelized, too - such as coarsening. However, 
already on medium sized graphs coarsening consumes less than 5% of the algorithm’s 
overall running time. Moreover, the relative running time of coarsening decreases even 
more with increasing graph size so that the effort does not seem worth it. 


4 Experimental Evaluation 

Methodology. We implementecQthe algorithm described above using C-H-. Paralleliza¬ 
tion of our algorithm has been done using OpenMP. We compiled our programs using 
g-H- 4.9 -03 and OpenMP 3.1. Executables for PivotMDS (PMDS) 0 and MaxEnt 
(GHN, for clarity we use the author names as acronym) 0 have been kindly provided 
by Yifan Hu. When comparing layouts computed by different algorithms, we evaluate 
two metrics. The first metric is the full stress measure, F{x) = vgv ''J^uv{\\xu — 
;Cii|| — duvY, and the second one is the maxent-stress function M{x) as defined in 
Eq. ([T]) at the final penalty level of a = 0.008. The latter is of primary importance since 
that is what GHN and MulMent optimize for. The implementations PMDS and GHN 
sometimes compute vertices that are on the same position. Hence, we add small random 
noise to the coordinates of these layouts in order to be able to compute the maxent- 
stress. More precisely, for each of the components of the 2D-coordinate of a node, we 
randomly add or subtract a random value from the interval [10“^, 10“^]. This changes 
the full stress measure by less than 10“^ percent on average. We follow the methodol¬ 
ogy of Gansner et al. 0 and scale the layout of all algorithms to minimize the stress to 
be fair to all methods: We find a scalar s such that vev U}uvis\\xu — a:„|| — duv)'^ 
is minimized for a given layout x. 

Machine. Our machine has four Octa-Core Intel Xeon E5-4640 (Sandy Bridge) pro¬ 
cessors (32 cores, 64 with hyperthreading active) which run at a clock speed of 2.4 
GHz. It has 512 GB local memory, 20 MB L3-Cache and 8x256 KB L2-Cache. Unless 
otherwise mentioned, our algorithms use all 64 cores (hyperthreading) of that machine. 
Since PMDS and GHN are sequential algorithms, they use one core of that machine. 


Algorithm Configuration. After an extensive evaluation of the parameters, we fixed 
the cluster coarsening parameters / to 20 and b to 2. The initial value of the penalty 
parameter a is set to 1. We perform at most a = 2 iterations with the same value of a, 
while it has not reached its minimum value of 0.008. When it has reached its minimum 
value, we iterate until the relative error \\x^^^ — cc^ll/llx^ll is smaller than 0.0001. Yet, 
our experiments indicate that our algorithm is not very sensitive about the choice of 


these parameters. We evaluate the influence of the approximation level h in Section 4.1 


' We will release the implementation of our algorithms as open source. 
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Instances. We use the instances 1138_bus, USpowerGrid, bcsstk31, commanche 
and luxembourg employed in lU and extend the set to include larger instances. We 
excluded the graphs gd, qh882 and Ip_ship04l from ||4| from our experiments since 
the graphs are either not undirected or the corresponding matrix is rectangular. Most of 
the instances taken from Q are available at the Florida Sparse Matrix Collection ||23]| . 
The graphs 3elt, bcsstk31, fe_pwt and autO are available at the Walshaw benchmark 
archive El. The graphs deIX are Delaunay triangulations of 2^ random points in the 
unit square ll^ . Moreover, the graphs nyc and luxembourg are road networks. These 
graphs have been taken from the benchmark set of the 9 th and 10th DIM ACS Imple¬ 
mentation Challenge 0261271 . A summary of the basic properties of these instances can 
be found in Table in the appendix. In any case, we draw the largest connected com¬ 
ponent if the graph has more than one. We assume unit length distance for all graphs. 

4.1 Influence of Coarse Graph Approximation and Scalability 

In this section, we investigate the influence of the parameter h on layout quality and 
running time (algorithmic speedup) as well as the scalability of our algorithms with 
varying number of threads (parallel speedup). We perform detailed experiments on our 
medium sized networks (using 64 threads) and present parallel speedups on the largest 
graphs auto and del20. We report absolute running times and parallel speedups for the 
graphs auto and del20 in Figures |^and|^and present detailed data for the medium size 
networks in Table in the appendix. We do not report layout quality metrics for autO 
and del20 since the size of the network makes it infeasible to compute them and the 
result of the algorithm is independent of the number of threads used. 

We now investigate the influence of the parameter h. In general, the larger the graphs 
get, the larger the algorithmic speedups obtained with increasing h. On the smallest 
graph in this collection, we obtain an algorithmic speedup of about 3 with h = 6 
(fe_pwt) over MulMento. On the largest two instances in this section, we obtain an 
algorithmic speedup of 30 with h = 9 (auto) and of 122 with h = 10 (del20). In addi¬ 
tion, the precise choice of the parameter does not seem to have a very large impact on 
solution quality on these graphs. This is also due to the size of the networks. As appar¬ 
ent from Tables[^and[^in the appendix, the graphs on which full stress measure slightly 
increases are luxembourg and bcsstk31 (7% and 15% respectively). The metric actually 
under consideration, maxent-stress, always remains comparable. On all instances under 
consideration, we observe a locally optimal value for h in terms of running time. It is 
around seven and seems to get larger with increasing graph size. This is due to the fact 
that too large values of h provide less precision and slower convergence. 

On del20, the scalability with the number of threads is almost perfect for small 
values of h. With enabled hyperthreading, we achieve slightly superlinear speedups 
for MulMento. As less work has to be done for increasing h, speedups get smaller. 
The smallest speedup on this graph has been observed for MulMentio. In this case, we 
achieve a speedup of 11.5 using 64 threads over MulMentio using one thread. With 
even larger h speedups increase again. The parallel scalability on autO is similar. 

Another interesting way to look at the data is the overall speedup - algorithmic 
and parallel speedup combined - achieved over MulMento using only one thread. The 
largest overall speedup is obtained by MulMentio using 64 threads. In this case, the 
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Fig. 2. Running times and parallel speedups of our algorithms on del20. 

overall speedup is larger than 4000 - reducing the running time of the algorithm from 
30 hours to 27 seconds. Speedups over PMDS and GHN are found in the next section. 

4.2 Comparison to other Drawing Algorithms 

We now compare MulMent to the two implementations PMDS lUl and GHN Q. We do 
this on all networks but only report quality metrics for small and medium sized graphs 
since it is infeasible to compute quality metrics for the large graphs. We report detailed 
data in Tables and in the appendix. 

Most importantly, although MulMent sometimes performs a few percent worse than 
GHN, the maxent-stress of all layouts is more or less similar. PMDS performs slightly 
worse in this metric. Intriguingly, the alternative full stress metric is consistently better 
on small networks for MulMent than the results obtained by PMDS (except for h = 
10). On the other hand, full stress obtained by our algorithms is comparable to the 
layout computed by GHN on four out of nine instances. On the three largest medium 
sized networks, we obtain worse full stress than PMDS and GHN. However, this is not 
astonishing since our algorithm does not optimize for full stress - in contrast to PMDS. 
And GHN at least starts with a PMDS solution and improves maxent-stress afterwards. 

Our implementations of MulMent 7 lo are always faster than GHN, both of them a 
factor 30 on average. Also, MulMenty lo outperform even PMDS in terms of running 
time as soon as the graphs get large enough (medium and large sized graphs). On the 
large graphs, MulMentig is a factor of 2 to 3 faster than PMDS and a factor of 32 to 
63 faster than GHN. In addition, MulMenty lo are also several times faster than GHN 
when using one thread only (see Appendix Table|^. 

4.3 Dynamic Networks 

One of the main advantages of the iterative scheme is its ability to use an existing layout 
for computing a new one, e. g. for a graph that has changed over time. We perform 
experiments with dynamic graphs obtained by modifying our medium sized networks. 
Often one is interested in drawing graphs that have more or less good locality. Hence, 
we define a random model that modihes the edges of a graph by removing random 
edges and inserting edges between vertices that are not too far apart. 

To be more precise, we start with an input graph G and perform a breadth hrst 
search from a random start node to compute a random spanning tree. We then remove 
x% undirected non-tree edges at random in the beginning. Note that this ensures that 
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the graph stays connected. Afterwards, we insert x% new edges as follows. We pick a 
random node and insert an undirected edge to a random node that has distance 1 < d < 
T) in the original graph G, where I? is a tuning parameter. We denote the graph that 
results out of this process as Q. 

We compute two layouts of Q. The first one updates coordinates given by an initial 
layout of G (update algorithm). The second layout is computed by our algorithm from 
scratch (scratch algorithm), i. e. discarding the initial layout. In the first case, we start 
directly at the penalty level a = 0.008 and only update coordinates on the finest level of 
the hierarchy. We compute the graph hierarchy as before but stop the coarsening process 
after the computation of h levels. Coordinates of the vertices on the approximation level 
are set to the middle point of the vertices in the corresponding cluster initially. 

We vary x € {1,5}, V S {2,16} and h € {0,7}, and present detailed data in 
Table l^in the appendix. As expected, the running time of the update algorithm (fdyn) is 
always smaller than the running time of the scratch algorithm (fscratch)- As MulMent 7 
performs less work than MulMento, algorithmic speedups are always larger for the 
latter. For h = 0, the update algorithm is a factor of 4 faster than the scratch algorithm 
on average. On the other hand, for h = 7 the update algorithm saves about 50% time on 
average over the scratch algorithm. Solution quality is not influenced much. On average, 
the full stress measure of the update algorithm is 9% larger and maxent-stress improves 
by 1% compared to the scratch algorithm. The increase in full stress is mostly due to 
the Delaunay instance and V = 16, in which the full stress of the layout of the update 
algorithm is a factor of two larger. The algorithmic speedup does not seem to be largely 
influenced by V. However, we expect that much larger values of T) will decrease the 
speedup of the update algorithm over the scratch algorithm. 

5 Conclusions 

We have presented a new multilevel algorithm for iteratively and approximatively opti¬ 
mizing the maxent-stress model, a model proposed by Gansner et al. to avoid typical 
pitfalls of other stress models. From the experimental evaluation we conclude that our 
parallel algorithm produces layouts with similar visual quality and maxent-stress values 
as the reference implementation Q. At the same time it is on average 30 times faster, 
even more for dynamic graphs. Moreover, our algorithm is even up to twice as fast as 
the fastest stress-based algorithm PivotMDS JS). It thus combines the high speed of 
PivotMDS with the high visual quality of Maxent in a single algorithm, at least if a 
multicore system is available. 

Acknowledgements. Financial support by DFG is acknowledged (DFG grants ME 
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A Additional Figures 



Fig. 3. An example round of the label propagation graph clustering algorithm. Initially each node 
is in its own block. The algorithm scans all vertices in a random order and moves a node to the 
block with the strongest connection in its neighborhood. 




Fig. 4. Contraction of a clustering (colors indicate cluster labels). Each cluster of the graph on the 
left corresponds to a node (= supervertex) in the graph on the right. 




Fig. 5. Running times and parallel speedups of our algorithms on autO. 
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B Running Time Proof Sketch 


We start with sketching the running time on the hnest level of the hierarchy for h = 
1. To simplify the analysis of our algorithm, we assume that our clustering algorithm 
always computes clusters of equal size, i.e. each cluster contains exactly c vertices. In 
this (admittedly not overly realistic case), there are n/c clusters. It is easy to see that 
we need 0{d(v) + c + n/c) time to evaluate Eq. ([^ for a vertex v. The optimal value 
for c is ^/n and can be found by simple analysis. Summing over all vertices, yields 
0{m + overall time per iteration. 

If we use the graph that is h levels beneath the current level for approximating r 

values, we need 0{d{v) + c + n/c^) time to compute the new coordinate for a vertex v. 

Here we assume again that each cluster in the hierarchy contains c vertices. As before, 

1 

a simple analysis yields that the optimal value for c is n'*+i. Summing over all vertices, 

h + 2 

yields 0{m + n'*+i) overall time. 

C Basic Properties of the Benchmark Set 


Table 1. Basic properties of the benchmark set with a rough type classification. 


graph 

n 

m 

Type II Ref. 

Small Graphs 

btree 

1023 

1022 

Binary Tree 

Ea 

1138_bus 

1 138 

1358 

Power System 

Ea 

USpowerGrid 

4941 

6 594 

US Power Grid 

na 

Belt 

4960 

13 722 

Airfoil 

ED 

commanche 

7 920 

11880 

Helicopter 

Ea 

Medium Graphs 

bcsstkSl 

35 586 

572 913 

Automobile Component 

E4l 

fe_pwt 

36519 

144794 

Structural Problem 

EH 

del 16 

65 536 

196 575 

Delaunay Triangulation 

Ea 

luxembourg 

114 599 

119 666 

Road Network 

Ea 

Large Graphs 

nyc 

264 346 

365 050 

Road Network 

m 

auto 

448 695 

3314611 

Automobile 

m 

del20 

1 048 576 

3 145 686 

Delaunay Triangulation 

Ea 
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D Detailed Experimental Results 


h 

graph 

F{x) 

M{x) 

its] 

~0 

bcsstk31 

31507K 

-14 925K 

5.69 

1 

bcsstk31 

31547K 

-14921K 

4.16 

2 

bcsstkSl 

31 887K 

-14931K 

2.77 

3 

bcsstk31 

30 938K 

-14 935K 

2.28 

4 

bcsstk31 

31449K 

-14 933K 

2.01 

5 

bcsstk31 

31819K 

-14921K 

1.88 

6 

bcsstk31 

31 894K 

-14919K 

1.81 

7 

bcsstk31 

32156K 

-14912K 

1.82 

8 

bcsstk31 

33 574K 

-14 888K 

1.88 

9 

bcsstk31 

34 306K 

-14 877K 

1.86 

10 

bcsstkSl 

35 316K 

-14861K 

1.86 

11 

bcsstkSl 

36086K 

-14 850K 

2.09 

“o 

fe_pwt 

51501K 

-23 850K 

4.44 

1 

fe_pwt 

50329K 

-23 867K 

2.64 

2 

fe_pwt 

50926K 

-23 855K 

1.67 

3 

fe_pwt 

50317K 

-23 864K 

1.11 

4 

fe_pwt 

49 620K 

-23 874K 

0.88 

5 

fe_pwt 

50420K 

-23 861K 

0.77 

6 

fe_pwt 

50927K 

-23 852K 

0.69 

7 

fe_pwt 

50 885K 

-23 850K 

0.59 

8 

fe_pwt 

49 889K 

-23 862K 

0.68 

9 

fe_pwt 

50174K 

-23 857K 

0.63 

10 

fe_pwt 

49464K 

-23 867K 

0.74 

11 

fe_pwt 

48 727K 

-23 879K 

0.80 

“o 

del 16 

716 997K 

-58 137K 

13.76 

1 

del 16 

716391K 

-58146K 

7.71 

2 

del 16 

724 073K 

-58031K 

4.40 

3 

del 16 

727 549K 

-57 980K 

2.69 

4 

del 16 

728 283K 

-57 968K 

1.95 

5 

del 16 

729 923K 

-57 940K 

1.47 

6 

del 16 

733 832K 

-57 884K 

1.14 

7 

del 16 

729503K 

-57 948K 

1.17 

8 

del 16 

725 548K 

-58 005K 

0.96 

9 

del 16 

722181K 

-58052K 

0.99 

10 

del 16 

718 768K 

-58097K 

1.20 

11 

del 16 

715 560K 

-58 138K 

1.41 

“o 

luxembourg 

503465K 

-310 576K 

41.02 

1 

luxembourg 

502 360K 

-310585K 

22.86 

2 

luxembourg 

501632K 

-310615K 

12.64 

3 

luxembourg 

502 340K 

-310 596K 

6.85 

4 

luxembourg 

498 744K 

-310659K 

4.04 

5 

luxembourg 

505 175K 

-310 546K 

2.46 

6 

luxembourg 

518 992K 

-310 370K 

1.73 

7 

luxembourg 

521771K 

-310331K 

1.35 

8 

luxembourg 

528 333K 

-310215K 

1.24 

9 

luxembourg 

534015K 

-310148K 

1.25 

10 

luxembourg 

537 286K 

-310106K 

1.48 

11 

luxembourg 

539427K 

-310072K 

2.01 


Table 2. Influence of parameter h. Smaller values are better. F measures full stress, M measures 
maxent-stress. Values marked with a K are given in thousands. 
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graph 

PMDS 

GHN 

MulMento 

MulMenti 

MulMent 2 

MulMentz 

MulMent 10 

btree 

-7 231 

-9 688 

-9128 

-9 086 

-9102 

-9110 

-9134 

1138_bus 

-10973 

-11917 

-11312 

-11280 

-11223 

-11236 

-11226 

USpowerG 

-260708 

-265 352 

-262664 

-262597 

-262411 

-262067 

-260938 

3elt 

-276 808 

-280 535 

-280004 

-280154 

-280169 

-279721 

-276 893 

commanche 

-950570 

-954 624 

-962521 

-962 851 

-963 107 

-956300 

-956237 

bcsstk31 

-15M 

-15M 

-15M 

-15M 

-15M 

-15M 

-15M 

fe_pwt 

-24M 

-24M 

-24M 

-24M 

-24M 

-24M 

-24M 

del 16 

-64M 

-61M 

-58M 

-58M 

-58M 

-58M 

-58M 

luxembourg 

-313M 

-314M 

-310M 

-310M 

-311M 

-310M 

-310M 


btree 

1138_bus 

USpowerG 

3elt 

commanche 

136070 
77 834 

1 123 582 

636240 

1 935 570 

63 721 

44 822 
1016164 

581770 

2051361 

86285 

60 364 
1 065 936 

580001 

1483 059 

87 099 

60631 

1071829 

568 919 

1460357 

87 478 

64 323 
1 073 659 
562 558 
1 446 978 

85916 
62 337 
1098 712 

580314 

1830 875 

85 786 
62786 
1155798 

752 865 
1834976 

bcsstk31 

33M 

31M 

32M 

32M 

32M 

32M 

35M 

fe_pwt 

24M 

22M 

52M 

50M 

51M 

51M 

49M 

del 16 

27M 

50M 

72M 

72M 

72M 

73M 

72M 

luxembourg 

28M 

23M 

50M 

50M 

50M 

52M 

54M 


Table 3. Maxent stress M{x) (top) and full stress F{x) (bottom) for small and medium sized 
graphs. Smaller is better. Values marked with an M are in millions. PMDS and GHN use one 
core/thread, MulMent* use 32 cores (64 threads). Values with an marked with an M are shown in 
million. Recall that GHN and MulMent optimize for M{x). 


graph 

PMDS 

GHN 

MulMento 

MulMenti 

MulMent 2 

MulMenti 

MulMentio 

btree 

0.02 

1.14 

0.10 

0.17 

0.11 

0.11 

0.11 

1138_bus 

0.04 

1.41 

0.15 

0.13 

0.10 

0.11 

0.11 

USpowerG 

0.14 

3.82 

0.29 

0.23 

0.18 

0.15 

0.18 

3elt 

0.16 

3.45 

0.21 

0.19 

0.16 

0.17 

0.17 

commanche 

0.24 

5.42 

0.32 

0.27 

0.18 

0.15 

0.18 

hcsstk31 

3.44 

48.48 

5.63 

3.97 

2.70 

1.75 

1.82 

fe_pwt 

1.49 

31.60 

4.46 

2.67 

1.62 

0.57 

0.66 

del 16 

2.86 

61.42 

13.63 

7.75 

4.38 

1.01 

1.10 

luxembourg 

3.10 

96.10 

40.94 

22.87 

12.53 

1.31 

1.39 

nyc 

9.03 

233.94 

216.27 

119.33 

64.19 

4.68 

3.70 

auto 

41.80 

665.67 

613.51 

329.08 

179.24 

23.64 

20.70 

del20 

53.80 

1125.03 

3303.82 

1749.77 

922.10 

51.35 

27.01 


Table 4. Running times in seconds per graph. Smaller is better. PivotMDS and GHN use one 
thread (sequential codes), the MulMent, algorithms use 32 cores (64 threads). Running times of 
GHN are without the time of PMDS (which yields input coordinates to GHN). 
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graph 

PMDS 

GHN 

MulMentr 

MulMentio 

btree 

0.02 

1.14 

0.07 

0.12 

1138_bus 

0.04 

1.41 

0.10 

0.21 

USpowerG 

0.14 

3.82 

0.30 

1.20 

3elt 

0.16 

3.45 

0.18 

0.70 

commanche 

0.24 

5.42 

0.49 

1.16 

bcsstk31 

3.44 

48.48 

4.41 

7.79 

fe_pwt 

1.49 

31.60 

2.62 

5.92 

del 16 

2.86 

61.42 

6.35 

10.95 

luxembourg 

3.10 

96.10 

16.51 

22.32 

nyc 

9.03 

233.94 

78.34 

53.24 

auto 

41.80 

665.67 

207.37 

118.65 

del20 

53.80 

1125.03 

1101.82 

310.29 


Table 5. Running times in seconds per graph. Smaller is better. All of the algorithms use one 
core. Running times of GHN are without the time of PMDS (which yields input coordinates to 
GHN). 
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graph 

h 

X 

V 

Fa(x) 

Fq{x) dyn 

Fq{x) scratch 

Ma(x) 

Mq{x) dyn 

Mq{x) scratch 

to 

idyn 

^scratch 

bcsstk31 

w 

T 

IT' 

30731K 

31514K 

32 867K 

-14932K 

-13914K 

-13 896K 

5.87 

1.64 

5.73 

bcsstk31 

0 

1 

16 

30731K 

74746K 

71772K 

-14932K 

-7 878K 

-7 975K 

5.94 

3.14 

10.18 

bcsstk31 

0 

5 

2 

30731K 

30680K 

32 271K 

-14932K 

-13 123K 

-13098K 

5.93 

2.43 

6.62 

bcsstk31 

0 

5 

16 

30731K 

77916K 

75 695K 

-14932K 

-6559K 

-6649K 

5.92 

3.25 

14.28 

bcsstk31 

y 

T 


32421K 

33517K 

32 888K 

-14903K 

-13 882K 

-13 894K 

1.86 

1.51 

1.72 

bcsstk31 

7 

1 

16 

32421K 

74908K 

71692K 

-14903K 

-7 869K 

-7 975K 

1.87 

1.61 

1.97 

bcsstk31 

7 

5 

2 

32421K 

32 638K 

31702K 

-14903K 

-13 092K 

-13098K 

1.87 

1.69 

1.90 

bcsstk31 

7 

5 

16 

32421K 

78 223K 

78 906K 

-14903K 

-6 543K 

-6565K 

1.96 

2.10 

2.94 

del 16 

o' 

T 

IT' 

668 857K 

664 866K 

737 076K 

-58 842K 

-58687K 

-57 623K 

14.31 

2.81 

14.04 

del 16 

0 

1 

16 

668 857K 

492192K 

250 399K 

-58 842K 

-48 309K 

-51443K 

14.21 

5.27 

14.45 

del 16 

0 

5 

2 

668 857K 

662 893K 

701 075 K 

-58 842K 

-57 787K 

-57 234K 

14.28 

2.82 

14.07 

del 16 

0 

5 

16 

668 857K 

458453K 

250483K 

-58 842K 

-40 845K 

-43 516K 

14.30 

7.73 

18.88 

del 16 

y 

T 


657 691K 

653 806K 

728 689K 

-58 998K 

-58 841K 

-57 746K 

1.24 

0.68 

1.02 

del 16 

7 

1 

16 

657 691K 

484715K 

250 234K 

-58 998K 

-48 389K 

-51455K 

1.24 

0.74 

1.13 

del 16 

7 

5 

2 

657 691K 

651547K 

715301K 

-58 998K 

-57 946K 

-57015K 

1.16 

0.69 

1.04 

del 16 

7 

5 

16 

657 691K 

451676K 

242099K 

-58 998K 

-40918K 

-43 650K 

1.27 

0.88 

1.28 

fe_pwt 

o' 

T 

IT' 

51015K 

42517K 

40472K 

-23 858K 

-23 283K 

-23 305K 

4.67 

1.00 

4.57 

fe_pwt 

0 

1 

16 

51015K 

30982K 

27 235 K 

-23 858K 

-17 935K 

-17 985K 

4.67 

1.01 

4.59 

fe_pwt 

0 

5 

2 

51015K 

37 268K 

38068K 

-23 858K 

-22 543K 

-22 530K 

4.72 

1.01 

4.59 

fe_pwt 

0 

5 

16 

51015K 

33316K 

29 371K 

-23 858K 

-15 423K 

-15477K 

4.73 

1.78 

4.68 

fe_pwt 

y 

T 


49563K 

41 148K 

38419K 

-23 870K 

-23 295K 

-23 328K 

0.67 

0.41 

0.58 

fe_pwt 

7 

1 

16 

49563K 

30930K 

27 818K 

-23 870K 

-17 932K 

-17 974K 

0.76 

0.42 

0.63 

fe_pwt 

7 

5 

2 

49563K 

36 242K 

36014K 

-23 870K 

-22551K 

-22551K 

0.72 

0.43 

0.64 

fe_pwt 

7 

5 

16 

49563K 

32 843K 

30130K 

-23 870K 

-15 426K 

-15473K 

0.76 

0.52 

0.68 

luxembourg 

y 

T 


511117K 

522212K 

526761K 

-310488K 

-311406K 

-311292K 

42.60 

7.69 

42.31 

luxembourg 

0 

1 

16 

511117K 

549426K 

556099K 

-310488K 

-307121K 

-307 076K 

42.50 

7.71 

42.49 

luxembourg 

0 

5 

2 

511 117K 

788036K 

881315K 

-310488K 

-318330K 

-317828K 

42.60 

7.69 

42.39 

luxembourg 

0 

5 

16 

511 117K 

619716K 

551 834K 

-310488K 

-297 447K 

-298 532K 

42.49 

7.71 

42.57 

luxembourg 

y 

T 


516 248K 

526 999K 

546 575K 

-310400K 

-311324K 

-311045K 

1.37 

0.55 

1.23 

luxembourg 

7 

1 

16 

516 248K 

551360K 

592054K 

-310400K 

-307 071K 

-306588K 

1.48 

0.56 

1.31 

luxembourg 

7 

5 

2 

516248K 

789014K 

883 948K 

-310400K 

-318323K 

-317 740K 

1.38 

0.53 

1.26 

luxembourg 

7 

5 

16 

516 248K 

617187K 

552312K 

-310400K 

-297 457K 

-298 503K 

1.35 

0.58 

1.29 


Table 6. Detailed per instances results for dynamic graph experiments. Smaller values are better. 
Values are given in thousands. F* measures full stress, M* measures maxent-stress, dyn refers to 
the algorithm that updates the given coordinates, scratch to the algorithm that discards the given 
coordinates and updates the layout. Running times are given in seconds. Values marked with a K 
are given in thousand. 
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E Additional Drawings 





Fig. 7. Drawings of the largest connected components of commanche (LHS) and 3elt (RHS). 
From top to bottom: PMDS, MaxEnt, MulMent 
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